Skip to content

Clean-up DeviceContext (Remove shadowed data structure) #1962

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 9, 2022
Merged

Clean-up DeviceContext (Remove shadowed data structure) #1962

merged 6 commits into from
Feb 9, 2022

Conversation

umariqbal-rs
Copy link
Contributor

Description

In this PR we will Clean-up DeviceContext (Remove shadowed data structure).

Related Issue

Motivation and Context

How Has This Been Tested?

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@github-actions github-actions bot added the VPR VPR FPGA Placement & Routing Tool label Jan 31, 2022
Copy link
Contributor

@tangxifan tangxifan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umariqbal-rs I review the codes. For all the client functions, please use RRGraphView instead of RRGraphBuilder when accessing data

@@ -126,7 +126,7 @@ bool read_route(const char* route_file, const t_router_opts& router_opts, bool v
recompute_occupancy_from_scratch();

/* Note: This pres_fac is not necessarily correct since it isn't the first routing iteration*/
OveruseInfo overuse_info(device_ctx.rr_nodes.size());
OveruseInfo overuse_info(device_ctx.rr_graph_builder.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For overuse report, just need to use RRGraphView

@@ -163,7 +163,7 @@ struct DeviceContext : public Context {
/* A writeable view of routing resource graph to be the ONLY database
* for routing resource graph builder functions.
*/
RRGraphBuilder rr_graph_builder{&rr_nodes, &rr_node_metadata, &rr_edge_metadata};
RRGraphBuilder rr_graph_builder{&rr_node_metadata, &rr_edge_metadata};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also delete the line 150 in this file

@@ -273,6 +273,10 @@ class RRGraphBuilder {
inline void init_fan_in() {
node_storage_.init_fan_in();
}
/** @brief Return number of nodes. This function is inlined for runtime optimization. */
inline size_t size() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the API name num_nodes() here, we can also change the API name in RRGraphView

@@ -929,7 +929,7 @@ void alloc_draw_structs(const t_arch* arch) {
/* Space is allocated for draw_rr_node but not initialized because we do *
* not yet know information about the routing resources. */
draw_state->draw_rr_node = (t_draw_rr_node*)vtr::malloc(
device_ctx.rr_nodes.size() * sizeof(t_draw_rr_node));
device_ctx.rr_graph_builder.size() * sizeof(t_draw_rr_node));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RRGraphView here

@@ -980,7 +980,7 @@ void init_draw_coords(float width_val) {
return; //do not initialize only if --disp off and --save_graphics off
/* Each time routing is on screen, need to reallocate the color of each *
* rr_node, as the number of rr_nodes may change. */
if (device_ctx.rr_nodes.size() != 0) {
if (device_ctx.rr_graph_builder.size() != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RRGraphView here

@@ -322,7 +322,7 @@ bool try_timing_driven_route_tmpl(const t_router_opts& router_opts,
*/
bool routing_is_successful = false;
WirelengthInfo wirelength_info;
OveruseInfo overuse_info(device_ctx.rr_nodes.size());
OveruseInfo overuse_info(device_ctx.rr_graph_builder.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RRGraphView here

@@ -80,7 +80,7 @@ bool alloc_route_tree_timing_structs(bool exists_ok) {
/* Allocates any structures needed to build the routing trees. */

auto& device_ctx = g_vpr_ctx.device();
bool route_tree_structs_are_allocated = (rr_node_to_rt_node.size() == size_t(device_ctx.rr_nodes.size())
bool route_tree_structs_are_allocated = (rr_node_to_rt_node.size() == size_t(device_ctx.rr_graph_builder.size())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RRGraphView here

@@ -100,7 +100,7 @@ std::vector<float> calculate_all_path_delays_from_rr_node(int src_rr_node, const
auto& device_ctx = g_vpr_ctx.device();
auto& routing_ctx = g_vpr_ctx.mutable_routing();

std::vector<float> path_delays_to(device_ctx.rr_nodes.size(), std::numeric_limits<float>::quiet_NaN());
std::vector<float> path_delays_to(device_ctx.rr_graph_builder.size(), std::numeric_limits<float>::quiet_NaN());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RRGraphView here

@@ -438,8 +438,8 @@ void ExtendedMapLookahead::compute(const std::vector<t_segment_inf>& segment_inf
util::RoutingCosts delay_costs;
util::RoutingCosts base_costs;
int total_path_count = 0;
std::vector<bool> node_expanded(device_ctx.rr_nodes.size());
std::vector<util::Search_Path> paths(device_ctx.rr_nodes.size());
std::vector<bool> node_expanded(device_ctx.rr_graph_builder.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RRGraphView here

@@ -588,7 +588,7 @@ static void run_dijkstra(RRNodeId start_node, int start_x, int start_y, t_routin
const auto& rr_graph = device_ctx.rr_graph;

auto& node_expanded = data->node_expanded;
node_expanded.resize(device_ctx.rr_nodes.size());
node_expanded.resize(device_ctx.rr_graph_builder.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RRGraphView here

@umariqbal-rs
Copy link
Contributor Author

@tangxifan
Can you Have a look on my PR?
Thanks

Copy link
Contributor

@tangxifan tangxifan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umariqbal-rs CI is red on basic tests. Can you look into it?

@@ -78,7 +78,7 @@ size_t grid_to_bin_y(size_t grid_y, const SpatialRouteTreeLookup& spatial_lookup
bool validate_route_tree_spatial_lookup(t_rt_node* rt_node, const SpatialRouteTreeLookup& spatial_lookup) {
auto& device_ctx = g_vpr_ctx.device();
const auto& rr_graph = device_ctx.rr_graph;
auto& rr_node = device_ctx.rr_nodes[rt_node->inode];
auto& rr_node = rr_graph.rr_nodes()[rt_node->inode];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have finished all the APIs, you can consider to replace this line by

RRNodeId rr_node = rt_node->inode;

And also adapt the rest codes in this function, rr_node.id() -> rr_node

@@ -33,7 +33,7 @@ void update_route_tree_spatial_lookup_recur(t_rt_node* rt_node, SpatialRouteTree
auto& device_ctx = g_vpr_ctx.device();
const auto& rr_graph = device_ctx.rr_graph;

auto& rr_node = device_ctx.rr_nodes[rt_node->inode];
auto& rr_node = rr_graph.rr_nodes()[rt_node->inode];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have finished all the APIs, you can consider to replace this line by

RRNodeId rr_node = rt_node->inode;

And also adapt the rest codes in this function, rr_node.id() -> rr_node

@@ -451,7 +451,7 @@ static void load_rr_indexed_data_T_values() {
static void calculate_average_switch(int inode, double& avg_switch_R, double& avg_switch_T, double& avg_switch_Cinternal, int& num_switches, short& buffered, vtr::vector<RRNodeId, std::vector<RREdgeId>>& fan_in_list) {
auto& device_ctx = g_vpr_ctx.device();
const auto& rr_graph = device_ctx.rr_graph;
const auto& rr_nodes = device_ctx.rr_nodes.view();
const auto& rr_nodes = rr_graph.rr_nodes().view();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we still need a view here. Can you try to delete this line? For LINE 466, you can use the rr_graph.
If it does work, we keep this line.

// Returns a range of RREdgeId's belonging to RRNodeId id.
//
// If this range is empty, then RRNodeId id has no edges.
inline vtr::StrongIdRange<RREdgeId> edge_range(const RRNodeId id) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const RRNodeId -> RRNodeId
It will be more memory efficient.

/* -- Internal data storage -- */
/* Note: only read-only object or data structures are allowed!!! */
private:
/* node-level storage including edge storages */
const t_rr_graph_storage& node_storage_;
/* Fast look-up for rr nodes */
const RRSpatialLookup& node_lookup_;

const vtr::vector<RREdgeId, RRNodeId> edge_dest_node_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why these two data are required?

@@ -505,9 +505,9 @@ static bool check_rr_graph_connectivity(RRNodeId prev_node, RRNodeId node) {
if (prev_node == node) return false;

auto& device_ctx = g_vpr_ctx.device();
const auto& rr_graph = device_ctx.rr_nodes;
const auto& rr_graph = device_ctx.rr_graph;
/*TODO We need to remove temp_rr_graph once rr_graph is resolved*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove these two lines.

@@ -147,7 +147,7 @@ struct DeviceContext : public Context {
/*
* Structures to define the routing architecture of the FPGA.
*/
t_rr_graph_storage rr_nodes; // autogenerated in build_rr_graph
//t_rr_graph_storage rr_nodes; // autogenerated in build_rr_graph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line

/** @brief Get the destination node for the iedge'th edge from specified RRNodeId.
* This method should generally not be used, and instead first_edge and
* last_edge should be used.*/
inline RRNodeId edge_sink_node(RRNodeId id, t_edge_size iedge) const {
return node_storage_.edge_sink_node(id, iedge);
}
inline RRNodeId edge_sink_node(RREdgeId edge) const {
return edge_dest_node_[edge];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tangxifan
We are using it here const vtr::vector<RREdgeId, RRNodeId> edge_dest_node_;.
to replace here.

auto switch_id = rr_graph.edge_switch(prev_edge);

@@ -568,7 +568,7 @@ static t_trace_branch traceback_branch(int node, int target_net_pin_index, std::
t_trace* prev_ptr = alloc_trace_data();
prev_ptr->index = inode;
prev_ptr->net_pin_index = OPEN; //Net pin index is invalid for Non-SINK nodes
prev_ptr->iswitch = device_ctx.rr_nodes.edge_switch(iedge);
prev_ptr->iswitch = rr_graph.edge_switch(iedge);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does rr_graph.nodes().edge_switch(iedge) work here? If we can reuse, it is better to reuse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tangxifan
I am facing some unknown errors for failing of basics tests. I will see it now.

@@ -2288,7 +2288,7 @@ static void draw_rr_pin(int inode, const ezgl::color& color, ezgl::renderer* g)
* the physical pin is on. */
void draw_get_rr_pin_coords(int inode, float* xcen, float* ycen, const e_side& pin_side) {
auto& device_ctx = g_vpr_ctx.device();
draw_get_rr_pin_coords(device_ctx.rr_nodes[inode], xcen, ycen, pin_side);
draw_get_rr_pin_coords(device_ctx.rr_graph.rr_nodes()[inode], xcen, ycen, pin_side);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tangxifan
Can you Please Have a look on this line ?

Copy link
Contributor

@tangxifan tangxifan Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umariqbal-rs This line looks o.k. We will refactor the functions in the draw.cpp later. Ideally, we want these function to use rr_graph and RRNodeId, rather than sending a rr_node data structure.

@github-actions github-actions bot removed the VPR VPR FPGA Placement & Routing Tool label Feb 2, 2022
@umariqbal-rs
Copy link
Contributor Author

@tangxifan
Can you Please have a look my PR?

@umariqbal-rs umariqbal-rs reopened this Feb 2, 2022
@github-actions github-actions bot added the VPR VPR FPGA Placement & Routing Tool label Feb 2, 2022
Copy link
Contributor

@tangxifan tangxifan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umariqbal-rs Code changes are clean! Thanks for the hard working. Can you start running QoR tests?

@umariqbal-rs
Copy link
Contributor Author

umariqbal-rs commented Feb 3, 2022

@tangxifan
Below are the QoR Results with Current master:
vtr_reg_qor_chain_depop test:
Flow run time 0.04% on average.
critical path delay 0% on average.
Peak memory usage 0.01% on average
titan_quick_qor test:
flow run time 14.48% on average.
critical path delay 0% on average.
Peak memory usage 0% on average
vtr_reg_qor_chain_depop.xlsx
titan_quic_qor.xlsx

@umariqbal-rs
Copy link
Contributor Author

@tangxifan
Below are the QoR Results with Sanity:
vtr_reg_qor_chain_depop test:
Flow run time -2.11% on average.
titan_quick_qor test:
flow run time 10.9% on average.

@@ -419,6 +429,10 @@ class RRGraphView {
const RRSpatialLookup& node_lookup() const {
return node_lookup_;
}
/** @brief Return the node-level storage structure for queries from client functions */
const t_rr_graph_storage& rr_nodes() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware of the runtime overhead. Can you try to make this API inline?

@tangxifan
Copy link
Contributor

@umariqbal-rs I see the runtime overhead is mainly on packer (+25% on average), while placer is next (+9% on average) and router is least (+7%)

@tangxifan
Copy link
Contributor

@umariqbal-rs

@vaughnbetz and I have gone through the spreadsheets. We see a constant overhead on runtime across all the Titan benchmarks. To help us understand why, can you try

  • Use an isolated machine where we can avoid the server workload issues
  • Try 1 or 2 benchmarks from Titan benchmarks on the selected machine
  • Check if you can still see any runtime difference (if we see the runtime difference in packer insists, we should pay attention).

Before this happen, ensure

  • We have inline the APIs

@umariqbal-rs
Copy link
Contributor Author

@tangxifan
Sure I will do this. I will do this in evening time when load is less on server.

@tangxifan
Copy link
Contributor

@umariqbal-rs Remember to resolve the merging conflicts

@umariqbal-rs
Copy link
Contributor Author

umariqbal-rs commented Feb 8, 2022

@tangxifan
Below are the QoR Results with Current master:
vtr_reg_qor_chain_depop test:
Flow run time 0.84% on average.
critical path delay 0% on average.
Peak memory usage 0% on average
titan_quick_qor test:
flow run time -0.57% on average.
critical path delay 0% on average.
Peak memory usage 0% on average
vtr_reg_chain_depop.xlsx
titan_quick_qor_master.xlsx

Copy link
Contributor

@tangxifan tangxifan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umariqbal-rs I see some changes in the download_titan.py script. Is it intended?

@umariqbal-rs
Copy link
Contributor Author

@umariqbal-rs I see some changes in the download_titan.py script. Is it intended?

@tangxifan
Yes Actually for running titan_tests I have previously used down_titan.py file which you provided.

@tangxifan
Copy link
Contributor

@umariqbal-rs I see some changes in the download_titan.py script. Is it intended?

@tangxifan Yes Actually for running titan_tests I have previously used down_titan.py file which you provided.

Emm. Weird. I cannot see this file in the change list. If the down_titan.py file is even with the current master. It should be o.k. then

@umariqbal-rs
Copy link
Contributor Author

umariqbal-rs commented Feb 8, 2022

@umariqbal-rs I see some changes in the download_titan.py script. Is it intended?

@tangxifan Yes Actually for running titan_tests I have previously used down_titan.py file which you provided.

Emm. Weird. I cannot see this file in the change list. If the down_titan.py file is even with the current master. It should be o.k. then
@tangxifan
I had used download_titan.py that you had sent us earlier to run test locally and pushed it on github, now I have restored download_titan.py.

@umariqbal-rs
Copy link
Contributor Author

@tangxifan
Kokoro tests are not running on PR.

@tangxifan
Copy link
Contributor

@tangxifan Kokoro tests are not running on PR.

No worries. We have logged this in #1965

Copy link
Contributor

@tangxifan tangxifan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umariqbal-rs Thanks for the hard working. It seems that the inline keyword impacts a lot here.

@vaughnbetz I propose to merge. QoR tests have been rerun on an isolated machine which shows no performance degradation. If you see anything wrong, feel free to comment.

@tangxifan tangxifan merged commit 66ffeaa into verilog-to-routing:master Feb 9, 2022
@tangxifan tangxifan deleted the rr_nodes branch February 9, 2022 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants